Biochemistry 158/258

A Human Genome Stained by Fluorescent in Situ Hybridization

Genomics, Bioinformatics & Medicine

Doug Brutlag

Functional Genomics Project

1) Select a protein from UniProt that is associated with a disease of interest to you. It does not have to be a protein involved in the disease you presented in the second assignment. Please choose a protein with a well-understood biological function. You can check the biological functions of your chosen protein by examining the Gene Ontology terms in its UniProt entry.
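As a rough aid for checking Gene Ontology terms, the sketch below pulls the GO cross-reference lines out of an entry saved in UniProt's text (flat-file) format. The sample entry, the entry name EXAMPLE_HUMAN, and the helper name parse_go_terms are made up for illustration; it assumes the usual "DR   GO; ID; aspect:name; evidence:source." line layout and is not a complete flat-file parser.

```python
# Illustrative excerpt in UniProt text (flat-file) format. This is NOT a real
# entry; the entry name and the choice of lines are assumptions for the example.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         300 AA.
DR   GO; GO:0005737; C:cytoplasm; IDA:UniProtKB.
DR   GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW.
DR   GO; GO:0006468; P:protein phosphorylation; IDA:UniProtKB.
DR   Pfam; PF00069; Pkinase; 1.
"""

# One-letter aspect codes used in GO cross-reference lines.
ASPECTS = {
    "C": "cellular component",
    "F": "molecular function",
    "P": "biological process",
}


def parse_go_terms(entry_text):
    """Return a list of dicts describing the GO cross-references in an entry."""
    terms = []
    for line in entry_text.splitlines():
        if not line.startswith("DR   GO;"):
            continue  # skip everything except GO cross-references
        # Drop the "DR   " prefix and the trailing period, then split on ';'.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        go_id = fields[1]
        aspect_code, _, name = fields[2].partition(":")
        evidence_code = fields[3].split(":")[0]
        terms.append({
            "id": go_id,
            "aspect": ASPECTS.get(aspect_code, aspect_code),
            "name": name,
            "evidence": evidence_code,
        })
    return terms


if __name__ == "__main__":
    for term in parse_go_terms(SAMPLE_ENTRY):
        print(f'{term["id"]}  {term["aspect"]:<20} {term["name"]} [{term["evidence"]}]')
```

Grouping the terms by aspect makes it easier to see whether the molecular function and biological process of a candidate protein are described in enough detail for the later steps.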

2) Search for homologs of your protein using the UniProt BLAST tool, restricting the search to UniProtKB/Swiss-Prot. Do NOT search all of UniProt. Please report two or three hits that are both statistically and biologically significant. Also report two or three hits that you think are neither statistically nor biologically significant. If your protein family is very large, you may have to ask BLAST to return more hits in order to find statistically insignificant ones. For very common proteins with large families, you may have to use NCBI BLAST, which will let you report up to 20,000 results. Please explain in full paragraphs your decision as to the statistical and biological significance of each hit. See the homework hints to find out how to do this.
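To make the statistical side of this judgement concrete, here is a minimal sketch of how an expectation value relates to a BLAST bit score, using the standard relation E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The hit names, E-values, and the 1e-3 cutoff are assumptions chosen for illustration, not values prescribed by this assignment, and real BLAST reports use length-corrected (effective) search spaces.

```python
def expected_hits(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S') with raw lengths; BLAST itself applies
    edge corrections, so treat the result as an approximation.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def classify(e_value, threshold=1e-3):
    """Label a hit by comparing its E-value to an assumed cutoff."""
    if e_value < threshold:
        return "statistically significant"
    return "not statistically significant"


if __name__ == "__main__":
    # Hypothetical hits: (name, E-value). These are made-up numbers.
    hits = [
        ("hypothetical_hit_A", 3e-80),
        ("hypothetical_hit_B", 2e-5),
        ("hypothetical_hit_C", 4.2),
    ]
    for name, e_value in hits:
        print(f"{name}: E = {e_value:g} -> {classify(e_value)}")

    # Each additional bit of score halves the expected number of chance hits.
    print(expected_hits(50, 300, 2e8), expected_hits(51, 300, 2e8))
```

A small E-value only says that the alignment is unlikely to arise by chance. Whether the hit is biologically significant still depends on comparing its annotated function with that of the query.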

3) Search your protein for motifs with the MyHits Motif Scan Query. Be sure to INCLUDE PROSITE patterns, frequent patterns, PROSITE profiles, more profiles and Pfam local HMMs in your search. Please send me the MyHits hits that you think are biologically significant and at least one or two hits that you think are not biologically significant. You judge biological significance by comparing the function of each pattern hit with the Gene Ontology terms associated with your query. Be sure to include high-frequency patterns so that you can discover some biologically insignificant hits. The high-frequency patterns are NOT protein functions; instead, they represent possible protein modification sites. The profile and HMM hits will have expectation values associated with them, which will help you determine statistical significance.
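The comparison between motif hits and Gene Ontology terms can be roughed out with a simple keyword overlap, as in the sketch below. The GO term names and motif descriptions are example inputs, and the stopword list and the helper names keywords and shared_keywords are assumptions made for illustration. This is only a first-pass filter, not a substitute for reading the motif documentation.

```python
import re

# Generic words that carry little functional meaning (an assumed list).
STOPWORDS = {"protein", "site", "domain", "profile", "signature",
             "activity", "the", "and", "of"}


def keywords(text):
    """Lower-case words of three or more characters, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def shared_keywords(hit_description, go_term_names):
    """Keywords a motif description has in common with the query's GO terms."""
    go_words = set().union(*(keywords(name) for name in go_term_names))
    return keywords(hit_description) & go_words


if __name__ == "__main__":
    # Example GO term names for a hypothetical kinase query.
    go_terms = ["ATP binding", "protein kinase activity",
                "protein phosphorylation"]
    # Example motif descriptions of the kind a motif scan can return.
    motif_hits = [
        "Protein kinase domain profile",
        "N-myristoylation site",
        "Casein kinase II phosphorylation site",
    ]
    for hit in motif_hits:
        overlap = sorted(shared_keywords(hit, go_terms))
        print(f"{hit}: shared keywords = {overlap}")
```

In this example the third description shares keywords with the GO terms even though it names a high-frequency modification-site pattern. A keyword match therefore only flags a hit for closer reading, and the final call still rests on your own interpretation.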

4) Search your protein for blocks using InterPro. Please send me a few of the InterPro domain hits you think are significant. InterPro sets its significance threshold quite high, so it often does not return any insignificant hits. If this is the case for your search, please state that no insignificant hits were found.

Please remember that this is a research report, not a list of short answers. You should write full paragraphs on each of the topics above. I am more interested in your interpretation of the results than in the results themselves, so be sure to describe in detail your evaluation of the statistical and biological significance of each result.

Hints on how to do this Project! (PDF)

© Doug Brutlag 2015